@ngxson ngxson commented May 10, 2025

Fixes #13414

For models that accept dynamic resolution (Pixtral / Mistral Small / Qwen VL), we want to:

  • Have a max resolution. If an image is larger than the max resolution, it is downscaled.
  • Warm up with a more reasonable resolution instead of the max resolution; otherwise many users will hit OOM (tbh I'm not sure this is the best solution, but let's try it and also add a custom max_image_size in the future).
  • Resize the image to a multiple of patch_size (or, in the case of Qwen VL, a multiple of patch_size * 2).
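
As a rough illustration of the first and third points, the size computation could look something like the sketch below. This is not the code from this PR: `ImageSize`, `fit_to_patch_grid`, `max_side`, and `align` are hypothetical names, and rounding down (rather than up or to nearest) is an assumption made so that the result never exceeds the max resolution.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical type for illustration; not taken from the PR.
struct ImageSize {
    int width;
    int height;
};

// Hypothetical helper: downscale so the longest side is at most max_side
// (keeping the aspect ratio), then snap each side DOWN to a multiple of align.
// align would be patch_size, or patch_size * 2 for Qwen VL.
// Works for any positive align, not only powers of 2.
inline ImageSize fit_to_patch_grid(ImageSize in, int max_side, int align) {
    assert(in.width > 0 && in.height > 0);
    assert(align > 0 && max_side >= align);

    long long w = in.width;
    long long h = in.height;
    const long long longest = std::max(w, h);

    // Step 1: cap the resolution, preserving the aspect ratio.
    if (longest > max_side) {
        w = w * max_side / longest;
        h = h * max_side / longest;
    }

    // Step 2: round down to a multiple of align, but keep at least one patch.
    w = std::max<long long>(align, (w / align) * align);
    h = std::max<long long>(align, (h / align) * align);

    return ImageSize{static_cast<int>(w), static_cast<int>(h)};
}
```

For example, with `max_side = 1024` and `align = 14`, a 2048x1024 input is first scaled to 1024x512 and then snapped to 1022x504. With `align = 28` (a patch size of 14 doubled), a 1000x500 input becomes 980x476.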

Btw @ggerganov, while working on this I realized that GGML_PAD only works when the multiple is a power of 2. Is this expected?

@ngxson ngxson requested a review from ggerganov May 10, 2025 17:03
@ggerganov ggerganov left a comment

Yes, GGML_PAD only works with powers of 2. We should add a comment to clarify that.
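
To see why the restriction exists, here is a minimal sketch. It assumes GGML_PAD uses the common bitmask round-up form `(((x) + (n) - 1) & ~((n) - 1))`; `pad_pow2` and `pad_any` are hypothetical stand-ins written for this illustration, not ggml functions.

```cpp
#include <cassert>

// Bitmask round-up, mirroring the form GGML_PAD is assumed to use.
// The mask ~(n - 1) only clears the low bits cleanly when n is a power of 2.
inline int pad_pow2(int x, int n) {
    return (x + n - 1) & ~(n - 1);
}

// Division-based round-up: works for any positive n, at the cost of
// an integer division instead of a bitwise AND.
inline int pad_any(int x, int n) {
    assert(x >= 0 && n > 0);
    return ((x + n - 1) / n) * n;
}
```

With `n = 16` both helpers agree: `pad_pow2(30, 16)` and `pad_any(30, 16)` each give 32. With `n = 14` (a typical non-power-of-2 patch size) they diverge: `pad_any(30, 14)` gives 42, while `pad_pow2(30, 14)` gives 34, which is not a multiple of 14.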

@ngxson ngxson merged commit 15e6125 into ggml-org:master May 10, 2025
44 checks passed
Development

Successfully merging this pull request may close these issues.

Eval bug: mtmd in server mode crashes on too big image
